Introduction

This package takes in data from bulk-Isoseq processed with and without cycloheximide (CHX), a nonsense mediated decay inhibitor (NMDi), and identifies isoforms and genes that are differentially expressed and/or undergoing NMD.

Prior to using this package, the following data processing needs to be performed on bulk Isoseq data:

Please view the Complete_workflow_example2.sh file to see a step-by-step usage of this package with dummy data.

The package outputs these following results based on hypothesis testing with chi-square tests within samples and between samples:

Isoforms

  • Hypothesis 1: isoforms expressed higher with CHX compared to non-CHX
  • Hypothesis 2: differentially expressed isoforms
  • Hypothesis 3: differentially expressed isoforms with CHX

Genes

  • Hypothesis 1: genes expressed higher with CHX compared to non-CHX
  • Hypothesis 2: differentially expressed genes
  • Hypothesis 3: differentially expressed genes with CHX
  • Hypothesis 5: genes with novel isoforms

Please note that hypothesis 4 is deprecated and is purposely skipped.

Hypothesis 1: Isoforms/Genes expressed higher with CHX compared to non-CHX

If we have an isoform/gene, and the isoform/gene contains a splicing variant that causes NMD, we should see that the isoform/gene has fewer mRNA transcripts because they should be degraded by NMD. If we add an NMDi like CHX, we stabilize the mRNA and we see more reads in the CHX treated sample.

Here are specifics regarding the statisical testing using a chi-square 2x2 table:

For example, isoforms in HARS1 is known to undergo alternative splicing which results in NMD in UDN212054 As a result, treatment with CHX causes HARS1 isoforms to show up as significant under hypothesis 1. However, EDN1 was not previously known to harbor variants that result in NMD in this patient, so EDN1 is now a potential candidate for evaluating the genetic cause.

UDN212054
Isoform_PBid Sample associated_gene P_Value_Hyp1 Cyclo_TPM Noncyclo_TPM NormalizedFractionDifference Phenotypes PhenotypesNotEmpty
PB.33489.130 UDN212054 EDN1 0.0443832 1.074748 0.000000 1.0000000 {High density lipoprotein cholesterol level QTL 7} (3); Question mark ears, isolated, 612798 (3), Autosomal dominant; Auriculocondylar syndrome 3, 615706 (3), Autosomal recessive TRUE
PB.33489.108 UDN212054 EDN1 0.0000000 834.004603 9.369151 0.9777817 {High density lipoprotein cholesterol level QTL 7} (3); Question mark ears, isolated, 612798 (3), Autosomal dominant; Auriculocondylar syndrome 3, 615706 (3), Autosomal recessive TRUE
PB.31662.153 UDN212054 HARS1 0.0000000 13.255228 0.000000 1.0000000 Charcot-Marie-Tooth disease, axonal, type 2W, 616625 (3), Autosomal dominant; Usher syndrome type 3B, 614504 (3), Autosomal recessive TRUE

Hypothesis 2: Differentially expressed isoforms/genes

For hypothesis 2, we are only looking at the reads from the noncyclo samples.

You might be thinking, why are we even looking at noncyclo reads in isolation. Don’t we need the cyclo reads to say anything about NMD? And you’re totally right, we would need the cyclo reads to study NMD. However, NMD isn’t the only way transcripts can be affected by genetic variants. Transcripts can also just have lower expression due to regulatory variants for example. So we wanted a hypothesis that can also allow us to find transcripts that have lower or higher expression due to things other than NMD. Which is why we came up with hypothesis 2, which allows us to find isoforms that have lower/higher expression regardless of NMD.

Here are specifics regarding the statisical testing using a chi-square 2x2 table:

For example, VTA1 is known to have decreased expression in UDN687128. Furthermore, the decrease in expression is allele-specific and not related to NMD.

UDN687128
Sample associated_gene Max_P_Value_Hyp2_below_median Cyclo_TPM Noncyclo_TPM Noncyclo_Z_Score Phenotypes PhenotypesNotEmpty
UDN687128 VTA1 0 97.95027 81.20662 -2.538116 FALSE

Hypothesis 3: Differentially expressed isoforms/genes with CHX

Hypothesis 3 is very similar to hypothesis 2, but in hypothesis 3, we are using just the cyclo samples. Hypothesis 3 allows us to pick out isoforms/genes that are differentially expressed when treated with cycloheximide.

Here are specifics regarding the statisical testing using a chi-square 2x2 table:

For example, SET isoforms are known to have splicing variants that result in NMD in UDN215640. UDN215640 also has SET isoforms with splicing variants that do not undergo NMD, as a result, you can notice that there are some isoforms that are highlighted in SET where the cyclo TPM is not greater than the noncyclo TPM. While these isoforms are shown here as being captured by hypothesis 3, they are also captured by hypothesis 2.

UDN215640
Isoform_PBid Sample associated_gene Max_P_Value_Hyp3_above_median Cyclo_TPM Noncyclo_TPM Cyclo_Z_Score Phenotypes PhenotypesNotEmpty
PB.51843.120 UDN215640 SET 0.0000000 18.292148 27.6827805 3.327809 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.135 UDN215640 SET 0.0009002 2.918960 5.9320244 3.315860 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.335 UDN215640 SET 0.0238786 1.556779 0.8987916 3.312667 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.116 UDN215640 SET 0.0000000 11.870436 14.2009069 3.326596 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.58 UDN215640 SET 0.0005698 3.113557 2.8761330 3.328201 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.67 UDN215640 SET 0.0000378 4.281141 7.9093659 3.328201 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.132 UDN215640 SET 0.0388493 1.362181 0.0000000 3.328201 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE
PB.51843.355 UDN215640 SET 0.0147725 1.751376 1.9773415 3.328201 Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant TRUE

Hypothesis 5: Genes with novel isoforms

Instead of looking at isoforms individually like we did in hypothesis 1, we wanted to have a method of grouping novel isoforms together. This way, we increase the statistical power to determine whether novel isoforms in a gene is undergoing NMD. We make the assumption that isoforms that have low/no abundance in the noncyclo sample are isoforms that are “novel” and likely ones that underwent NMD. We want to group these together into a single bin so that we have more power for statistical testing.

Here is a diagram illustrating the binning process:

Here are specifics regarding the statisical testing using a chi-square 2x2 table:

For example, MFN2 is known to have several isoforms that have alternative splicing and results in NMD. Individually, these novel isoforms have low read counts. However, when combined together at the gene level, we are able to pick out MFN2 as a gene containing novel isoforms that undergo NMD.

UDN633333
Sample associated_gene P_Value_Hyp5 proportion_in_Bin1_cyclo proportion_in_Bin1_noncyclo Cyclo_TPM Noncyclo_TPM NormalizedFractionDifference Phenotypes P_Value_Hyp1 PhenotypesNotEmpty
UDN633333 MFN2 0 0.4 0 25.17233 24.84468 0.0065509 Lipomatosis, multiple symmetric, with or without peripheral neuropathy, 151800 (3), Autosomal recessive; Charcot-Marie-Tooth disease, axonal, type 2A2A, 609260 (3), Autosomal dominant; Charcot-Marie-Tooth disease, axonal, type 2A2B, 617087 (3), Autosomal recessive; Hereditary motor and sensory neuropathy VIA, 601152 (3), Autosomal dominant 0.9683644 TRUE

Validating effects of NMD

To validate the effects of NMD, PCA was performed on the gene-level (left) and isoform-level (right) expression matrix for the 13 patients (26 samples) included in the analysis.

Acknowledgements

This package is developed during my time at the University of Washington as a Genome Sciences graduate student in the Stergachis Lab. Acknowledgements to: